Abstract. A rooted acyclic digraph N with labelled leaves displays a tree T
when there exists a way to select a unique parent of each hybrid vertex resulting 
in the tree T. Let Tr(N) denote the set of all trees displayed by the network N. 
In general, there may be many other networks M such that Tr(M) = Tr(N). 
A network is regular if it is isomorphic with its cover digraph. This paper shows 
that if N is regular, there is a procedure to reconstruct N given Tr(N). Hence 
if N and M are regular networks and Tr(N) = Tr(M), it follows that N = M, 
proving that a regular network is uniquely determined by its displayed trees. 

1 Introduction 

It has become common, for a given collection X of taxa and a particular
gene g, to use phylogenetic methods to determine a phylogenetic tree T_g.
The extant taxa correspond to leaves of the trees, while internal vertices corre- 
spond to ancestral species. The arcs correspond to genetic change, typically by 
mutations in the DNA such as substitutions, insertions, and deletions. Com- 
mon methods for determining the trees include maximum likelihood, maximum 
parsimony, and neighbor-joining, but many other methods are also utilized. 

Commonly the use of a different gene h for the same collection X of taxa
results in a tree T_h that differs from T_g. Indeed, many different trees arise
for different genes g but the same X. For example [TJ] utilized 106 orthologs 
common to seven species of yeast and an outgroup. The collection of 106 
maximum-parsimony trees and 106 maximum-likelihood trees included more 
than 20 different robustly supported topologies. While [TJ] concatenated the 
data to try to achieve resolution, [9] employed consensus networks to display 
the incompatibilities that existed among the trees. 

One hypothesis to explain the deviations of such gene trees from a single 
"species tree" is to assume "lineage sorting". In this model a single species 
tree is seen as a kind of pipeline containing populations with significant genetic
diversity; the genes actually fixate at locations that need not coincide with the
speciation events in the species tree. Hence the genes do not necessarily follow 
the species tree. Coalescence methods such as [TS], 0, [IB] utilize this approach. 
For example, [7] shows that the most likely gene tree need not coincide with 
the species tree. Much of the resulting diversity, however, makes use of short 
branch-lengths separating some speciation events in the species tree. 

Another hypothesis to explain the deviations of such gene trees from a sin- 
gle "species tree" is to assume that evolution actually occurs on networks that 
are not necessarily trees. Besides mutation events, these networks could include 
such additional reticulation events as hybridization or lateral gene transfer. Gen- 
eral frameworks are discussed in pQ, [2J, [T2J, and [T3]- 

Even if the underlying species relationships are given by a network, the 
evolution of an individual gene might best be described by a tree. The idea is 
that, at a hybridization event, some genes would be inherited from one parent 
species, and other genes from another parent species. Suppose, for example, the 
underlying species network is M in Figure 1. Species 2 is hybrid with parental 
species B and C. If a particular gene in 2 is inherited from B, then the correct 
description of the inheritance of that gene would be tree b in Figure 2. If instead 
a gene in 2 is inherited from C, then the correct description for that gene would 
be tree c in Figure 2. Thus we would expect to see both trees b and c among 
the various gene trees. Trees b and c are said to be displayed by the network. 
On the other hand, tree d in Figure 2 is not displayed by M, so we would not 
expect a gene to evolve according to d under these assumptions. 




The assumption that the underlying description of evolutionary history is a 
network rather than a tree raises the fundamental problem of reconstructing a 
network from data. Suppose that a collection of gene trees for the same set X 
of taxa is known. Can the underlying network be uniquely reconstructed? 

If M is a network, let Tr(M) denote the set of rooted trees displayed by M. 
Figure 1 shows two distinct networks M and N such that Tr(M) = Tr(N) = 
{b, c}. This example represents a common situation. In general there may be
many networks that display exactly the same trees.

Figure 2: Some trees related to Figure 1. Both M and N display trees b and c
but not d.

One approach has been to seek a network that displays a collection of trees 
and which has the fewest hybridization events. This problem was proved to be 
NP-hard [4]. Various special cases with additional hypotheses on the networks 
have also been studied, such as [TS], [S], [TU], [TT].

A different approach has been to make assumptions on the properties of 
an allowable phylogenetic network. It would be desirable to have a class of 
phylogenetic networks which is biologically plausible and such that there is often 
a uniquely determined network of this type with certain observable properties. 
It is commonly assumed that the networks are rooted acyclic digraphs [T7] , [H] , 
[T3] . Restrictions that appear tractable and yield interesting results include time 
consistency [12], [6], roughly that the parents of a hybrid be contemporaneous. 
Others include restrictions on the children of vertices, for example tree-child 
networks [5] or tree-sibling networks [6]. Certain unique reconstructions for
"normal" networks are given in [19].

Baroni and Steel [3] defined the notion of a regular network. The precise
definition is given in Section 2. The basic idea is as follows: The cluster cl(v) of
a vertex v is the set of leaves which are descendants of v. In a regular network,
no two distinct vertices have the same cluster. Moreover, cl(u) ⊂ cl(v) iff there
is a directed path from v to u. In Figure 1, M is regular but N is not since in
N cl(D) = cl(E) = cl(F) = {1, 2, 3}.

The main result of this paper is Theorem 3.1. This theorem gives a method
which, given Tr(M) for a regular network M, uniquely reconstructs the network
M. Corollary 3.2 asserts the consequence that if M and N are regular
networks with the same leaves and Tr(M) = Tr(N), then M = N. Thus the
entire collection of trees displayed by a regular network uniquely determines the
network.

Figure 1 shows that without the assumption of regularity, the network is not
uniquely determined by the set of its displayed trees. Another example is given
in Section 5.

The proof of Theorem 3.1 is constructive. A procedure MaximumProperChild
is applied to the input 𝒟 = Tr(N). When N is regular, the procedure
outputs the network N up to isomorphism. (In fact it outputs the cover digraph
of N, see [3].) An example is worked in Section 4, illustrating the procedure.
Also in Section 4, we observe that the input need not be 𝒟 = Tr(N) but instead
might be an appropriate subset of Tr(N). Characterizing this subset, however,
remains an open problem.

It is not likely in a real biological problem that all the trees displayed by a 
network are known. The number of such trees could easily grow exponentially 
with the number of leaves. Hence the main theorem is primarily of theoretical 
interest: For any two distinct regular phylogenetic networks there must exist a 
tree displayed by one but not the other. 

This situation contrasts with that in which the generalized clusters or tree 
clusters of a network are given instead of all the displayed trees. For a net- 
work N, a generalized or tree cluster is any cluster of any tree T displayed 
by N. The set of all tree clusters of N is denoted TrCl(N). The paper [T5] 
presents examples of distinct regular networks (indeed normal networks) M and 
N with the same leaf sets and which have precisely the same tree clusters; thus 
TrCl(M) = TrCl(N) but M and N are not isomorphic. The author there- 
fore finds it somewhat surprising that, as shown in the current paper, the trees 
themselves do determine the network uniquely for a broad class of networks. 

2 Basics 

A directed graph or digraph N = (V, A) consists of a finite set V = V(N) of
vertices and a finite set A = A(N) of arcs, each consisting of an ordered pair
(u, v) where u ∈ V, v ∈ V, u ≠ v, interpreted as an arrow from u (the parent)
to v (the child). There are no multiple arcs and no loops. A directed path is
a sequence u_0, u_1, ..., u_k of vertices such that for i = 1, ..., k, (u_{i-1}, u_i) ∈ A.
The length of the path is k and the path is trivial if k = 0. The graph is acyclic
if there is no nontrivial directed path starting and ending at the same point.
Write u ≤_N v, or more informally u ≤ v in N, if there is a directed path starting
at u and ending at v. Write u < v if u ≤ v and u ≠ v. If the graph is acyclic,
it is easy to see that ≤ is a partial order on V.

A vertex r is a root of the directed acyclic graph (V, A) if, for all v ∈ V,
r ≤ v. The network is rooted if it has a root. Clearly there can be at most one
root.

The indegree of vertex u is the number of v ∈ V such that (v, u) ∈ A. The
outdegree of u is the number of v ∈ V such that (u, v) ∈ A. If N is rooted at
r then r is the only vertex of indegree 0. A leaf is a vertex of outdegree 0. A
normal (or tree) vertex is a vertex of indegree at most 1. A hybrid vertex (or
recombination vertex or reticulation node) is a vertex of indegree at least 2.

Let X be a set. The cardinality of X will be denoted |X|. In biological terms
we consider the members of X to be a specific collection of biological species. We
call X the base-set of the directed graph N = (V, A) if there is a given one-to-one
relationship between X and the subset L(N) ⊆ V consisting of the leaves of
N. Thus we identify the leaves of N with the members of X. The interpretation
of X is that its members correspond to taxa on which direct measurements may
be made, while A describes a proposed evolutionary history giving rise to these
taxa. The leaves correspond to extant taxa so direct measurements are possible.
Typically one taxon is included which is an outgroup, an extant species clearly
on a separate evolutionary track from all other taxa. Hence the root is located
as the attachment vertex of the outgroup taxon.

In this paper a (phylogenetic) network N = (V, A, r, X) is an acyclic digraph
(V, A) with root r and base-set X. Two networks N = (V, A, r, X) and M =
(V', A', r', X) are isomorphic, N ≅ M, iff there is a bijection φ : V → V' such
that for all x ∈ X, φ(x) = x, and (u, v) ∈ A iff (φ(u), φ(v)) ∈ A'.

Let N = (V, A, r, X) be a phylogenetic network. Let P(X) denote the set of
all subsets of X. For v ∈ V define the (full) cluster of v in N by cl(v, N) = {x ∈
X : v ≤ x}. It is clear that for each v ∈ V, cl(v, N) ∈ P(X). Define for each
phylogenetic network N with base-set X, cl_N : V → P(X) by cl_N(v) = cl(v, N).

The following properties of the clusters are basic: 

(1) For v ∈ V, cl_N(v) is nonempty.

(2) If u ≤ v in N then cl_N(v) ⊆ cl_N(u).

(3) cl_N(r) = X.

(4) If x ∈ X, then cl_N(x) = {x}.

Note that (1) follows since a maximal path must end at a leaf and every leaf
lies in X. Moreover (2) follows since ≤ is a partial order. In particular, if (u, v)
is an arc of N, then cl_N(v) ⊆ cl_N(u). Also, (3) follows since for each x ∈ X, we
have r ≤ x, and (4) follows since each x ∈ X satisfies that x is a leaf.

The clusters X and {x} for x ∈ X are called the trivial clusters since they
occur in each network. Any other clusters will be called nontrivial.
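The cluster map can be computed by a simple recursion over children. The following is a minimal sketch, not part of the paper: the function name `clusters` and the arc-list encoding are assumptions made for illustration, and the arcs of the network M of Figure 1 are read off the description in the text (species 2 is hybrid with parents B and C, with root A).

```python
def clusters(arcs, leaves):
    """Return {vertex: cl(vertex)} for a rooted acyclic digraph.

    arcs   -- iterable of (parent, child) pairs
    leaves -- the base-set X (vertices of outdegree 0)
    """
    leaves = set(leaves)
    children = {}
    vertices = set(leaves)
    for u, v in arcs:
        children.setdefault(u, []).append(v)
        vertices.update((u, v))
    memo = {}

    def cl(v):
        # cl(v) is {v} for a leaf, else the union of the children's clusters.
        if v not in memo:
            if v in leaves:
                memo[v] = frozenset([v])
            else:
                memo[v] = frozenset().union(
                    *(cl(c) for c in children.get(v, [])))
        return memo[v]

    return {v: cl(v) for v in vertices}


# Network M of Figure 1, as described in the text (hypothetical encoding).
M_ARCS = [("A", "B"), ("A", "C"), ("B", 1), ("B", 2), ("C", 2), ("C", 3)]
cl_M = clusters(M_ARCS, leaves={1, 2, 3})
```

Under this encoding the nontrivial clusters of M are cl(B) = {1, 2} and cl(C) = {2, 3}, and properties (1) through (4) can be checked directly on `cl_M`.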

Given the network N = (V, A, r, X), we may let C(N) = {cl_N(v) : v ∈ V} ⊆
P(X). The cover digraph of N is the digraph (W, E) where

(1) W = C(N), and

(2) there is an arc (B, C) ∈ E for B and C in W iff
(2a) C ⊂ B; and
(2b) there is no D ∈ W such that C ⊂ D ⊂ B.

Note the root is X = cl_N(r) because for all x ∈ X, r ≤ x. Note since the
members of X are the leaves of N, it follows for each x ∈ X, cl_N(x) = {x} so
the leaves of the cover digraph are the singleton sets {x} for x ∈ X. Hence the
leaves may be identified with the members of X and the root r with X.
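Conditions (2a) and (2b) translate almost literally into code. The sketch below is illustrative only; the name `cover_digraph` and the use of Python frozensets for clusters are assumptions of the example. It is applied to the cluster family of the network M of Figure 1.

```python
def cover_digraph(family):
    """Return (W, E): the cover digraph of a family of clusters."""
    W = {frozenset(c) for c in family}
    E = set()
    for B in W:
        for C in W:
            # (2a) C is a proper subset of B, and
            # (2b) no member D of W lies strictly between C and B.
            if C < B and not any(C < D < B for D in W):
                E.add((B, C))
    return W, E


# Clusters of M in Figure 1: the root cluster, cl(B), cl(C), and the leaves.
family_M = [{1, 2, 3}, {1, 2}, {2, 3}, {1}, {2}, {3}]
W_M, E_M = cover_digraph(family_M)
```

Here `<` on frozensets is proper inclusion, so `C < D < B` is exactly the strict chain excluded by (2b). For M the result has six arcs, matching the six arcs of M itself.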

Baroni and Steel [3] defined a regular network to be a network which is
isomorphic with its cover digraph. Following is an equivalent description: The
phylogenetic network N = (V, A, r, X) is regular provided

(1) cl_N : V → P(X) is one-to-one; and

(2) there is an arc (u, v) ∈ A iff cl_N(v) ⊂ cl_N(u) and there is no w ∈ V such
that cl_N(v) ⊂ cl_N(w) ⊂ cl_N(u).
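The two conditions can be tested mechanically. The following sketch is an illustration under stated assumptions: the function name `is_regular` and the arc-list encoding are hypothetical, the arcs of M are taken from the description of Figure 1, and the second network (M with an extra arc from A to 2) is an invented example that is not in the paper.

```python
def is_regular(arcs, leaves):
    """Check conditions (1) and (2) for a rooted acyclic digraph."""
    leaves = set(leaves)
    arcs = set(arcs)
    children, vertices = {}, set(leaves)
    for u, v in arcs:
        children.setdefault(u, []).append(v)
        vertices.update((u, v))

    def cl(v):
        if v in leaves:
            return frozenset([v])
        return frozenset().union(*(cl(c) for c in children.get(v, [])))

    cluster = {v: cl(v) for v in vertices}
    family = set(cluster.values())
    # (1) the cluster map must be one-to-one.
    if len(family) != len(vertices):
        return False
    # (2) the arcs must be exactly the covering pairs of clusters.
    cover = {(B, C) for B in family for C in family
             if C < B and not any(C < D < B for D in family)}
    return {(cluster[u], cluster[v]) for u, v in arcs} == cover


M_ARCS = [("A", "B"), ("A", "C"), ("B", 1), ("B", 2), ("C", 2), ("C", 3)]
regular_M = is_regular(M_ARCS, {1, 2, 3})
# Hypothetical variant: the added arc (A, 2) is not a covering pair.
regular_variant = is_regular(M_ARCS + [("A", 2)], {1, 2, 3})
```

The variant fails condition (2) because cl(2) ⊂ cl(B) ⊂ cl(A), so the arc (A, 2) is redundant.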

Let N = (V, A, r, X) be a phylogenetic network. A parent map for N is a
map p : V − {r} → V such that for each v ∈ V, v ≠ r, p(v) is a parent of v, i.e.,
(p(v), v) ∈ A. Since the root r is unique, it is clear that if v ∈ V, v ≠ r, then v
has a parent. Note that if v is normal and v ≠ r, then v has exactly one parent
q, so all parent maps p will satisfy p(v) = q. When v is hybrid, however, there
are at least two parents of v.

Let Par(N) denote the collection of parent maps for the network N. Let
i(v, N) denote the indegree of v in the network N. Then the number of distinct
parent maps is clearly |Par(N)| = ∏[i(v, N) : v ∈ V, v ≠ r].
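As a quick arithmetic check of the product formula, the sketch below counts parent maps from an arc list. The function name and the encoding of the network M of Figure 1 are assumptions made for this example.

```python
from collections import Counter
from math import prod


def count_parent_maps(arcs, root):
    """|Par(N)|: the product of the indegrees of all non-root vertices."""
    indegree = Counter(child for _, child in arcs)
    return prod(d for v, d in indegree.items() if v != root)


# M of Figure 1: only vertex 2 is hybrid, with parents B and C.
M_ARCS = [("A", "B"), ("A", "C"), ("B", 1), ("B", 2), ("C", 2), ("C", 3)]
n_maps_M = count_parent_maps(M_ARCS, root="A")
```

For M the indegrees are 1, 1, 1, 2, 1, so there are two parent maps, one for each choice of parent of the hybrid vertex 2.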

For any parent map p for N = (V, A, r, X) construct a new network N_p =
(V, E, r, X) as follows: The vertex set, root, and base-set are the same as for
N. The arc set E consists of all arcs of the form (p(v), v) where v ∈ V, v ≠ r.
Thus E ⊆ A.

Each vertex v other than r has exactly one parent in N_p; i.e., i(v, N_p) = 1.
Hence N_p is a rooted tree. It is quite possible that v has outdegree 1 as well
as indegree 1, but such vertices are often suppressed in a rooted tree. We will
therefore consider two kinds of simplification to change N_p into a rooted tree in
standard form.

Type 1: Suppress a vertex with outdegree 1. More specifically, if u has
outdegree 1, say via arc (u, v), then remove u; remove also each arc (w, u) and
replace it by arc (w, v).

Type 2: Suppress a vertex with no directed path to a member of X. More 
specifically, suppose u is such a vertex. Then, delete u, for each arc (v, u) delete 
(v, u), and for each arc (u, v), delete (u, v). 

The result of performing all possible simplifications of Type 1 or Type 2 on
N_p is denoted T(N_p), called the standard form of N_p.

Figure 2 exhibits some rooted trees related to Figure 1. In M let the parent
map p satisfy p(2) = B (and trivially p(1) = B, p(3) = C, p(B) = A, p(C) = A).
Then the tree M_p is given in a. Since C has outdegree 1 in Fig 2a, it is suppressed
by Type 1, resulting in the standard form b, which is T(M_p). Similarly if p' is
the parent map with p'(2) = C, then Fig 2c shows T(M_p'). The tree d is not
displayed by M since there is no parent map yielding d.

It is easy to see that N in Fig 1 also displays b and c. Consider the parent
map q for N given by q(2) = H, q(G) = E, q(H) = E. Then N_q = e in Figure
2. We simplify e by suppressing F by Type 2 and then D and G by Type
1. Hence T(N_q) = c in Figure 2. For both the networks in Figure 1, we have
Tr(M) = Tr(N) = {b, c} using the notation in Figure 2.
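The set Tr(M) can be enumerated by brute force over parent maps. The sketch below is illustrative and rests on one modelling assumption: each displayed tree is represented by its set of clusters. This suffices here because Type 1 and Type 2 simplifications change no nonempty cluster, and a rooted tree in standard form is determined by its clusters. The function name and the encoding of M are hypothetical.

```python
from itertools import product


def displayed_trees(arcs, leaves, root):
    """Return Tr(N), each tree given as a frozenset of its clusters."""
    leaves = set(leaves)
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, []).append(u)
    nonroot = sorted(parents, key=str)
    trees = set()
    # One parent map per choice of a parent for every non-root vertex.
    for choice in product(*(parents[v] for v in nonroot)):
        children = {}
        for v, p in zip(nonroot, choice):
            children.setdefault(p, []).append(v)

        def cl(v):
            if v in leaves:
                return frozenset([v])
            return frozenset().union(
                *(cl(c) for c in children.get(v, [])))

        # Dropping empty clusters mirrors Type 2; vertices of outdegree 1
        # repeat their child's cluster, which mirrors Type 1.
        all_clusters = (cl(v) for v in set(nonroot) | {root})
        trees.add(frozenset(c for c in all_clusters if c))
    return trees


M_ARCS = [("A", "B"), ("A", "C"), ("B", 1), ("B", 2), ("C", 2), ("C", 3)]
trees_M = displayed_trees(M_ARCS, {1, 2, 3}, root="A")
```

For M this yields two trees: one with nontrivial cluster {1, 2} (tree b) and one with nontrivial cluster {2, 3} (tree c).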

3 Reconstruction of regular networks from all 
their trees 

Suppose 𝒟 is a nonempty collection of rooted trees each with the same base-set
X. In this section we present a procedure called MaximumProperChild (MPC)
which constructs a phylogenetic network MPC(𝒟) given 𝒟. The algorithm
always terminates with a network.

The main theorem 3.1 asserts that if 𝒟 = Tr(N) for some regular network
N, then the output of the procedure is N, so N has been reconstructed.

Theorem 3.1. Suppose N = (V, A, r, X) is a regular phylogenetic network.
Then the output of MaximumProperChild applied to 𝒟 = Tr(N) is isomorphic
with N; i.e., MPC(Tr(N)) ≅ N.

An immediate consequence of Theorem 3.1 is Cor 3.2, which asserts that the 
set of trees displayed by a regular network uniquely determines the network. 

Corollary 3.2. Suppose M and N are regular phylogenetic networks with base-set
X. If Tr(M) = Tr(N), then M ≅ N.

Theorem 3.1 need not be true without the assumption that 𝒟 includes all the
trees displayed by N. It is easy to find examples in which N is not reconstructed
if some trees are missing from 𝒟. On the other hand, it is also easy to find
examples in which 𝒟 ≠ Tr(N) but still MPC(𝒟) ≅ N. What is important is
that the "right" trees lie in 𝒟. Roughly speaking, the "right" trees are those
that arise via the use of Lemma 3.3. Further discussion of this point is in Section
5.

The number of displayed trees may be exponentially large in |X|, so the
algorithm need not be polynomial-time in |X|. It is easy to see, however, that
the procedure is polynomial-time in |X| + |𝒟|.

Figure 1 shows that Cor 3.2 fails without the assumption of regularity. 

Here is an overview of the procedure: We reconstruct the network N
recursively by finding the clusters of N. Initially, we have only the root cluster,
which is X. At any given stage, given a cluster C = cl(u, N) for some vertex
u ∈ V we are able to identify the clusters of all its children in N. To
do so, by construction we will know already the clusters along a directed path
X = P_n, P_{n-1}, ..., P_0 = C from X to C. The "proper trees" for C, denoted
ProperTr(C), will consist of any input trees T exhibiting all the clusters along
such a directed path already in our reconstruction. We list the clusters for the
children of C in all the proper trees for C. Among these we consider the set of
"maximal proper children," denoted MaxProperCh(C), consisting of the clusters
U for children of C in a proper tree, such that there is no other cluster
W which is a child of C in some proper tree and for which U ⊂ W ⊂ C. We
show that these maximal children are necessarily the clusters of children of C
in N, and all clusters of the children of C in N arise in this manner. Hence the
children of C are precisely the members of MaxProperCh(C). We insert the
members of MaxProperCh(C) into the set of vertices of our reconstruction,
together with arcs from C to each such vertex; then we continue recursively.

An example of the procedure will be given in Section 4. 

The following is a more precise, formal description:



Algorithm MaximumProperChild. 

Input: 𝒟 is a nonempty collection of rooted trees each with the base-set X.
Output: a regular phylogenetic network M with base-set X.

Procedure. We construct a sequence M_0, M_1, ... of directed graphs where M_k =
(V_k, A_k). Each member of V_k is a nonempty subset of X, and V_0 ⊆ V_1 ⊆ V_2 ⊆ ...

1. Initially M_0 = (V_0, A_0) with V_0 = {X} and A_0 = ∅. Thus M_0 has a single
vertex which is the cluster X. This vertex is not checked off.

Recursively perform the following step 2:

2. Suppose M_k = (V_k, A_k) is known and some vertex U ∈ V_k is not checked
off.

2a. If V_k contains a singleton set C = {a} which is not checked off, then
M_{k+1} = M_k except that {a} has been checked off.

2b. If V_k contains a doubleton set C = {a, b} which is not checked off, then
V_{k+1} = V_k ∪ {{a}, {b}}, and A_{k+1} = A_k ∪ {({a, b}, {a}), ({a, b}, {b})}. In M_{k+1}
check off all members of V_k that were already checked off and in addition check
off {a, b}, {a}, and {b} but nothing else. This thus adjoins the two singletons
{a} and {b}.

2c. Suppose neither 2a nor 2b applies. Suppose C ∈ V_k has not been
checked off. Let ProperTr(C) = {T ∈ 𝒟 : C is a cluster of T and there
is a directed path X = P_n, P_{n-1}, ..., P_1, P_0 = C in T such that for each i,
cl(P_i, T) is a vertex of M_k and each arc (P_i, P_{i-1}) is an arc of M_k} be the set
of proper trees for C. Let ProperCh(C) = {D : for some T ∈ ProperTr(C),
D is a child of C} be the set of children of C in any proper tree for C. Let
MaxProperCh(C) = {D ∈ ProperCh(C) : there is no D' in ProperCh(C) such
that D ⊂ D' ⊂ C} (strict inclusions) be the set of maximal proper children of C.
For each D ∈ MaxProperCh(C), adjoin to M_k the vertex D (if it is not already
present) and the arc (C, D). More explicitly define V_{k+1} = V_k ∪ {D : D ∈
MaxProperCh(C)}. Define A_{k+1} = A_k ∪ {(C, D) : D ∈ MaxProperCh(C)}.
In M_{k+1} check off all vertices checked off in M_k and also check off C but nothing
else. Note that it is possible that D is already present in V_k, but that this
construction may still introduce a new arc incoming to D.

3. The procedure terminates with M_n such that every member of V_n has
been checked off. Return M_n.
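Steps 1 through 3 can be sketched in a few lines. The following is an illustrative implementation under stated assumptions, not the author's code: each input tree is represented by its set of clusters, the function name `mpc` is hypothetical, and the order in which unchecked vertices are visited (smallest cluster first, so that 2a and 2b take priority over 2c) is a choice made here for determinism.

```python
def mpc(trees, X):
    """Sketch of MaximumProperChild; returns (vertices, arcs) as clusters."""
    X = frozenset(X)
    trees = [frozenset(frozenset(c) for c in t) for t in trees]
    V, A, checked = {X}, set(), set()          # step 1

    def children_in(tree, C):
        # Children of C in a tree: maximal clusters properly inside C.
        below = [D for D in tree if D < C]
        return [D for D in below if not any(D < E for E in below)]

    def is_proper(tree, C):
        # The root-to-C path of the tree must already lie in (V, A).
        if C not in tree:
            return False
        chain = sorted((P for P in tree if C <= P), key=len, reverse=True)
        if any(P not in V for P in chain):
            return False
        return all((P, Q) in A for P, Q in zip(chain, chain[1:]))

    while V - checked:                          # step 2
        C = min(V - checked, key=lambda c: (len(c), sorted(map(str, c))))
        if len(C) == 2:                         # step 2b
            for a in C:
                V.add(frozenset([a]))
                A.add((C, frozenset([a])))
                checked.add(frozenset([a]))
        elif len(C) > 2:                        # step 2c
            proper = [t for t in trees if is_proper(t, C)]
            proper_ch = {D for t in proper for D in children_in(t, C)}
            max_ch = {D for D in proper_ch
                      if not any(D < E for E in proper_ch)}
            V |= max_ch
            A |= {(C, D) for D in max_ch}
        checked.add(C)                          # also covers step 2a
    return V, A                                 # step 3


# Trees b and c of Figure 2, written as cluster sets.
tree_b = [{1, 2, 3}, {1, 2}, {1}, {2}, {3}]
tree_c = [{1, 2, 3}, {2, 3}, {1}, {2}, {3}]
V_out, A_out = mpc([tree_b, tree_c], {1, 2, 3})
```

On the input {b, c} the sketch returns the cover digraph of the regular network M of Figure 1: the maximal proper children of X are {1, 2} and {2, 3}, while {1} and {3} are discarded because they sit strictly inside another proper child.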



It is clear that the procedure always terminates, whether or not 𝒟 = Tr(N).
This is because X is a finite set, so P(X) is finite and there can only be finitely
many vertices. At the end of 2a, 2b, or 2c an additional vertex is checked off.
Hence after finitely many steps all vertices must be checked off.

Moreover, whenever a new vertex D is added in step 2b or 2c, ProperTr(D)
is nonempty. This is trivially true if D arose as a singleton set in 2b. If D arose
in 2c, then there exists a parent C of D and T ∈ ProperTr(C). Hence T also
lies in ProperTr(D). Thus when 2c is applied to C containing at least three
members of X, it identifies a child D of C which is a nonempty proper subset
of C. It follows that when the procedure terminates, each singleton set {x} is
in V_n.

An example is given in the next section.

We now turn to the proof of Theorem 3.1. The first step is a lemma which
identifies a useful tree related to a given directed path in N.

Lemma 3.3. Let C be a vertex of N. Let P_n = r, P_{n-1}, ..., P_1, P_0 = C be a
directed path in N from the root r to C. There exists a tree T displayed by N
in standard form such that, for i = 0, ..., n, P_i is a vertex of T and we have
cl(P_i, T) = cl(P_i, N).

Proof. We find a tree T as follows: The parent map p which yields T is selected
by

(0) If W is normal, W ≠ r, then p(W) is the unique parent of W.

(1) If H is hybrid and C < H, choose a parent p(H) of H such that C ≤ p(H)
in N.

(2) Suppose n ≥ 1. If H is hybrid and P_1 < H, but it is false that C < H,
choose p(H) such that P_1 ≤ p(H) in N.

(3) Suppose n ≥ 2. If H is hybrid and P_2 < H but it is false that P_1 < H
(hence also false that C < H), then select p(H) such that P_2 ≤ p(H) in N.

(k) In general, if n ≥ k, H is hybrid, and P_k < H but it is false that P_{k-1} < H,
select p(H) such that P_k ≤ p(H) in N.

Since P_n = r, it follows that for each hybrid H, p(H) will be defined.

I claim that cl(C, N_p) = cl(C, N). Clearly cl(C, N_p) ⊆ cl(C, N). Conversely,
suppose W is a vertex of N and C < W in N. I will show that C < W in N_p.
It suffices to show that whenever C < W in N, then there exists a parent P of
W in N_p such that C ≤ P in N. The result is immediate if W has a unique
parent P in N because since C < W it follows C ≤ P. If, instead, W is hybrid,
then by assumption p(W) is a parent of W in N_p and by (1) C ≤ p(W) in N.
This proves that C < W in N_p if C < W in N. Now, if x ∈ cl(C, N) the choice
W = x shows, since C ≤ x in N, that C ≤ x in N_p, whence x ∈ cl(C, N_p). Thus
cl(C, N_p) = cl(C, N).

Suppose n ≥ 1. I now claim that cl(P_1, N_p) = cl(P_1, N). It is immediate
that cl(P_1, N_p) ⊆ cl(P_1, N). For the converse, suppose x ∈ cl(P_1, N). Suppose
W is a vertex of N and P_1 < W in N. I show that P_1 < W in N_p. It suffices to
show that if P_1 < W in N, then there exists a parent P of W in N_p such that
P_1 ≤ P in N. If C < W in N, then from above there exists a parent P of W in
N_p such that C ≤ P in N, whence P_1 < C ≤ P in N. Hence we may assume
that C ≮ W in N. If W is normal, then its unique parent P must satisfy that
P < W in N_p (since arcs to normal vertices remain in N_p) whence P_1 ≤ P in
N. If instead W is hybrid, then since P_1 < W but C ≮ W it follows from (2)
that p(W) satisfies P_1 ≤ p(W) in N. This proves that P_1 < W in N_p if P_1 < W
in N. Now, if x ∈ cl(P_1, N) the choice W = x shows, since P_1 ≤ x in N, that
P_1 ≤ x in N_p, whence x ∈ cl(P_1, N_p). Thus cl(P_1, N_p) = cl(P_1, N).

The argument can be iterated to show that for i = 0, ..., n, cl(P_i, N_p) =
cl(P_i, N).

Let T = T(N_p) be the standard form of N_p obtained by suppressing vertices
of outdegree 1 and vertices with no directed paths to any member of X. By
regularity of N, the sets cl(P_i, N) are distinct for i = 0, ..., n. Hence the sets
cl(P_i, N_p) are distinct for i = 0, ..., n. I claim that P_n, ..., P_0 are vertices of
T.

Note first that there exists a directed path in N of maximal length (number
of arcs) starting at P_0 = C. The path must end at some leaf which consists of a
member x ∈ X since X contains all the leaves. From (0) and (1) it follows that
there is a path in N_p from C to x as well; otherwise some vertex W on that
path would satisfy that C < W in N so some parent P of W satisfies C ≤ P,
but p(W) satisfies that C ≰ p(W), contradicting (0) or (1). Hence there is a
directed path in N_p from C to x, whence also a directed path from each P_i to
x. It follows that no P_i is suppressed by Type 2 for lack of a path to a member
of X.

Moreover, for i = 1, ..., n, P_i is a vertex of T; otherwise P_i would have
outdegree 1 in N_p whence cl(P_i, N_p) = cl(P_{i-1}, N_p), whence cl(P_i, N) =
cl(P_{i-1}, N), contradicting the distinctness of these clusters.
Moreover, I claim that C = P_0 is a vertex of T. The claim is immediate if C is
a leaf. If C is not a leaf then C has children D_1, D_2, ..., D_k in N, with k ≥ 2.
By regularity cl(D_j, N) is a proper subset of cl(C, N). If C were not a vertex
of T, it would have outdegree 1 in N_p. Assume its child in N_p is D_1. Then
cl(C, N) = cl(C, N_p) = cl(D_1, N_p) ⊆ cl(D_1, N) ⊂ cl(C, N), a contradiction.

It follows that in T there is a directed path P_n = X, P_{n-1}, ..., P_1, P_0 = C
such that for i = 0, ..., n, cl(P_i, T) = cl(P_i, N).

□ 

We now prove Theorem 3.1.

Proof. Let N be a regular network and 𝒟 = Tr(N). Let the sequence of
networks obtained from MPC be denoted M_0, M_1, ..., M_n where M_j = (V_j, A_j)
has the set V_j of vertices and the set A_j of arcs. Initially V_0 = {X} and A_0 = ∅.
The proof will be by induction. The j-th inductive hypothesis H_j is that

(1) For each vertex U of M_j there exists a vertex U' of N such that U =
cl(U', N).

(2) For each arc (U, W) of M_j, (U', W') is an arc of N.

(3) For each vertex U of M_j that has at least one child in M_j, for every child
Y of U' in N, there exists a vertex W of M_j such that W is a child of U in M_j
and W' = Y.

H_0 is trivially true since X is the only vertex of M_0 and X' is the root of N.

Claim 1. Assume H_j and the procedure has not terminated. We show H_{j+1}.

If 2a or 2b applies, then Claim 1 is immediate. Hence we assume that 2c
applies and there is a vertex C of M_j containing at least three points which has
not been checked off. Compute ProperTr(C) and MaxProperCh(C) as above.
By H_j, there exists vertex C' of N such that C = cl(C', N). It suffices to show
that

(a) for each child Y of C' in N, D := cl(Y, N) lies in MaxProperCh(C); and

(b) each member of MaxProperCh(C) consists of a cluster D for which there
exists a child E of C' in N such that D = cl(E, N).

We first prove (a):

Claim 1a. Let Y be a child of C' in N. Then D = cl(Y, N) is a member of
MaxProperCh(C).

Since C is a vertex in M_j, there exists by H_j a directed path r = P_n, P_{n-1},
P_{n-2}, ..., P_1, P_0 = C' in N from r to C' such that for i = 0, ..., n, cl(P_i, N)
is a vertex of M_j and for i = 1, ..., n, (cl(P_i, N), cl(P_{i-1}, N)) is an arc of M_j.
(This is because C occurred in M_j as a child of some vertex, which occurred in
M_j as a child of some other vertex, etc.)

By Lemma 3.3, since Y is a child of C' in N, there exists a tree T in
Tr(N) that contains the directed path r = Q_n, Q_{n-1}, ..., Q_0, Q_{-1} for which
cl(Q_i, T) = cl(P_i, N), cl(Q_0, T) = cl(C', N) = C and cl(Q_{-1}, T) = cl(Y, N) =
D. By H_j, T ∈ ProperTr(C), so it follows that D = cl(Y, N) ∈ ProperCh(C).

I claim that D ∈ MaxProperCh(C). Otherwise, there exists a tree T̂ in
ProperTr(C) with vertex Ĉ such that C = cl(Ĉ, T̂), Ĉ has child D̂ in T̂, and
D ⊂ cl(D̂, T̂) ⊂ C = cl(Ĉ, T̂). Let r = P̂_m, P̂_{m-1}, ..., P̂_0 = Ĉ be the directed
path from the root r to Ĉ in T̂. By construction, for i = 0, ..., m, cl(P̂_i, T̂) is a
member of M_j and for i = 1, ..., m, each arc (cl(P̂_i, T̂), cl(P̂_{i-1}, T̂)) is an arc in
M_j. By H_j, for each i, cl(P̂_i, T̂) is a cluster of N; i.e., there exists vertex Q_i in
N such that cl(Q_i, N) = cl(P̂_i, T̂) and (Q_i, Q_{i-1}) is an arc of N. In particular,
by regularity of N, Q_0 = C'.

Let p be the parent map that yields T̂ (i.e., T(N_p) = T̂). Note for 0 ≤ i ≤ m
that P̂_i is a vertex of both T̂ and N_p. Then cl(Q_i, N) = cl(P̂_i, T̂) = cl(P̂_i, N_p) ⊆
cl(P̂_i, N). Since N is regular it follows that P̂_i ≤ Q_i in N. Since the arcs of N_p
form a subset of the arcs of N, it follows from P̂_{i+1} < P̂_i in N_p that P̂_{i+1} < P̂_i
in N as well for i = 0, ..., m − 1.

Since cl(P̂_m, N_p) = X = cl(Q_m, N) it is clear that P̂_m = Q_m.

Now cl(Q_{m-1}, N) = cl(P̂_{m-1}, N_p) ⊆ cl(P̂_{m-1}, N) ⊆ cl(P̂_m, N) = cl(Q_m, N)
[since P̂_m < P̂_{m-1} in N]. By regularity of N it follows that Q_m ≤ P̂_{m-1} ≤ Q_{m-1}
in N. The arc (Q_m, Q_{m-1}) of N is not redundant, so it follows that either
P̂_{m-1} = Q_m or P̂_{m-1} = Q_{m-1}. But P̂_{m-1} ≠ P̂_m = Q_m, so we see that
P̂_{m-1} = Q_{m-1}.

Similarly cl(Q_{m-2}, N) = cl(P̂_{m-2}, N_p) ⊆ cl(P̂_{m-2}, N) ⊆ cl(P̂_{m-1}, N) =
cl(Q_{m-1}, N) [since P̂_{m-1} < P̂_{m-2} in N]. By regularity of N it follows that
Q_{m-1} ≤ P̂_{m-2} ≤ Q_{m-2} in N. The arc (Q_{m-1}, Q_{m-2}) of N is not redundant
since N is regular, so it follows that either P̂_{m-2} = Q_{m-1} or P̂_{m-2} = Q_{m-2}.
But P̂_{m-2} ≠ P̂_{m-1} = Q_{m-1}, so we see that P̂_{m-2} = Q_{m-2}.

In like manner we see that P̂_i = Q_i for i = m − 3, m − 4, ..., 0.

It follows that Ĉ = P̂_0 = Q_0 = C'. Since Y is a child of C' in N we know
cl(Y, N) = D ⊂ cl(D̂, T̂) = cl(D̂, N_p) ⊆ cl(D̂, N) ⊆ cl(P̂_0, N) = cl(C', N). It
follows that C' ≤ D̂ ≤ Y in N. Since the arc (C', Y) is nonredundant, either
D̂ = C' or D̂ = Y. But D̂ ≠ C' since D̂ is a child of Ĉ = C'. It follows
that D̂ = Y. Hence D = cl(Y, N) ⊂ cl(D̂, T̂) ⊆ cl(D̂, N) = cl(Y, N), which is
impossible. This contradiction proves that D = cl(Y, N) ∈ MaxProperCh(C).

Next we prove (b):

Claim 1b. Each member D of MaxProperCh(C) satisfies that there exists
a child E of C' in N such that D = cl(E, N).

Let D be a member of MaxProperCh(C). Thus there exists a tree T̂ in
ProperTr(C) with vertex Ĉ such that C = cl(Ĉ, T̂) and Ĉ has child D̂ in T̂ such
that cl(D̂, T̂) = D. Let r = P̂_m, P̂_{m-1}, ..., P̂_0 = Ĉ be the directed path from r
to Ĉ in T̂. By construction, for i = 0, ..., m, cl(P̂_i, T̂) is a vertex of M_j and for
i = 1, ..., m, each arc (cl(P̂_i, T̂), cl(P̂_{i-1}, T̂)) is an arc in M_j. By H_j, for i such
that 0 ≤ i ≤ m, there exists a vertex Q_i of N such that cl(P̂_i, T̂) = cl(Q_i, N),
and for 1 ≤ i ≤ m, (Q_i, Q_{i-1}) is an arc of N. In particular, by regularity of N,
Q_0 = C'.

As in the proof of Claim 1a, we see that P̂_i = Q_i for i = m, m − 1, ..., 0
and Ĉ = P̂_0 = Q_0 = C'. Since D̂ is a child of Ĉ in N_p, it follows that Ĉ < D̂
in N_p, whence C' < D̂ in N. Since D̂ ≠ C', there exists a child E of Ĉ = C' in
N such that E ≤ D̂. Hence D = cl(D̂, T̂) = cl(D̂, N_p) ⊆ cl(E, N). By Claim
1a, cl(E, N) ∈ MaxProperCh(C). Hence D is not in MaxProperCh(C) unless
D = cl(E, N), proving Claim 1b.

This completes the proof of Claim 1. 

We now complete the proof of Theorem 3.1. 

We saw above that the procedure terminates, say with $M_n$. By Claim 1,
$H_n$ will be true. In fact, each vertex $W$ of $N$ has been represented in $M_n$ in
the sense that $\mathrm{cl}(W,N) \in V_n$. To see this, note that there is a directed path
$P_0 = r, P_1, \dots, P_k = W$ in $N$ since $r$ is the root of $N$. Since $X' = r$, by
Claim 1a it follows that $\mathrm{cl}(P_1,N)$ is a member of $\mathrm{MaxProperCh}(X)$, whence
by construction $\mathrm{cl}(P_1,N) \in V_n$, $\mathrm{cl}(P_1,N)' = P_1$, and $(\mathrm{cl}(r,N), \mathrm{cl}(P_1,N)) \in A_n$.
Since $\mathrm{cl}(P_1,N)' = P_1$ and $P_2$ is a child of $P_1$ in $N$, by Claim 1a again it
follows that $\mathrm{cl}(P_2,N) \in V_n$, $\mathrm{cl}(P_2,N)' = P_2$, and $(\mathrm{cl}(P_1,N), \mathrm{cl}(P_2,N)) \in A_n$.
Repeating the argument we ultimately obtain that $\mathrm{cl}(P_k,N) = \mathrm{cl}(W,N) \in V_n$.

Since every arc in $N$ lies on some directed path in $N$ starting at $r$, and hence
occurs as some arc $(P_j, P_{j+1})$ using the notation above, the same argument
shows that the arc corresponds to the arc $(\mathrm{cl}(P_j,N), \mathrm{cl}(P_{j+1},N)) \in A_n$. Thus
every vertex and arc of $N$ has a corresponding vertex and arc in $M_n$.

There remains only to show that $M_n$ has no additional vertices or arcs. By
Claim 1b every vertex which is added at any stage has the form $\mathrm{cl}(E,N)$ for
some vertex $E$ of $N$. Hence $M_n$ has no additional vertices. By Claim 1a, every
arc in $M_n$ corresponds to an arc in $N$.

This completes the proof. 

□ 

4 An example of the reconstruction 

Let $N$ be the network given in Figure 3. The base-set is $X = \{1,2,3,4,5,6\}$.
The clusters satisfy $\mathrm{cl}(A) = X$, $\mathrm{cl}(B) = \{1,2,3,4,6\}$, $\mathrm{cl}(C) = \{5,6\}$,
$\mathrm{cl}(D) = \{1,2,3,6\}$, $\mathrm{cl}(E) = \{2,3\}$, $\mathrm{cl}(F) = \{1,2,6\}$, $\mathrm{cl}(G) = \{1,2,3\}$, and
$\mathrm{cl}(i) = \{i\}$ for $1 \le i \le 6$. An inspection shows that $N$ is regular.

There are three hybrid vertices 1, 2, 6, each with indegree 2. Hence there
are $2^3 = 8$ parent maps. Here I will list the displayed trees by giving the parent map
and the nontrivial clusters of each:

Figure 3: A regular network $N$ with $X = \{1,2,3,4,5,6\}$ which will be reconstructed from its trees.



$T_1$: $p(1) = G$, $p(2) = E$, $p(6) = F$. Clusters $\{2,3\}$, $\{1,2,3\}$, $\{1,2,3,6\}$, $\{1,2,3,4,6\}$.
$T_2$: $p(1) = G$, $p(2) = E$, $p(6) = C$. Clusters $\{2,3\}$, $\{1,2,3\}$, $\{1,2,3,4\}$, $\{5,6\}$.
$T_3$: $p(1) = G$, $p(2) = F$, $p(6) = F$. Clusters $\{1,3\}$, $\{2,6\}$, $\{1,2,3,6\}$, $\{1,2,3,4,6\}$.
$T_4$: $p(1) = G$, $p(2) = F$, $p(6) = C$. Clusters $\{1,3\}$, $\{1,2,3\}$, $\{1,2,3,4\}$, $\{5,6\}$.
$T_5$: $p(1) = F$, $p(2) = E$, $p(6) = F$. Clusters $\{2,3\}$, $\{1,6\}$, $\{1,2,3,6\}$, $\{1,2,3,4,6\}$.
$T_6$: $p(1) = F$, $p(2) = E$, $p(6) = C$. Clusters $\{2,3\}$, $\{1,2,3\}$, $\{1,2,3,4\}$, $\{5,6\}$.
$T_7$: $p(1) = F$, $p(2) = F$, $p(6) = F$. Clusters $\{1,2,6\}$, $\{1,2,3,6\}$, $\{1,2,3,4,6\}$.
$T_8$: $p(1) = F$, $p(2) = F$, $p(6) = C$. Clusters $\{1,2\}$, $\{1,2,3\}$, $\{1,2,3,4\}$, $\{5,6\}$.
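
The list above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original procedure, that enumerates the 8 parent maps and computes the nontrivial clusters of each resulting tree. The arc list `ARCS` is an assumption: the text gives only the clusters of Figure 3, and the arcs here are read off as the cover relation of those clusters, which is justified only because $N$ is regular. The names `ARCS`, `LEAVES`, `parents_of`, and `displayed_clusters` are illustrative.

```python
from itertools import product

# ASSUMPTION: arcs of N inferred from the clusters of Figure 3 (cover relation).
ARCS = {
    "A": ["B", "C"],
    "B": ["D", "4"],
    "C": ["5", "6"],
    "D": ["F", "G"],
    "E": ["2", "3"],
    "F": ["1", "2", "6"],
    "G": ["E", "1"],
}
LEAVES = ["1", "2", "3", "4", "5", "6"]


def parents_of(arcs):
    """Invert the child lists into a map vertex -> list of parents."""
    parents = {}
    for u, children in arcs.items():
        for v in children:
            parents.setdefault(v, []).append(u)
    return parents


def displayed_clusters(arcs, leaves):
    """Yield (parent_map, nontrivial_clusters) for every parent map."""
    parents = parents_of(arcs)
    hybrids = sorted(v for v, ps in parents.items() if len(ps) > 1)
    for choice in product(*(sorted(parents[h]) for h in hybrids)):
        pmap = dict(zip(hybrids, choice))
        # Keep only the chosen parent arc into each hybrid vertex.
        kept = {u: [v for v in cs if pmap.get(v, u) == u]
                for u, cs in arcs.items()}

        def cluster(v):
            if v in leaves:
                return frozenset([int(v)])
            return frozenset().union(*(cluster(c) for c in kept[v]))

        all_clusters = {cluster(v) for v in arcs}
        # Drop the root cluster, singletons, and empty (pruned) vertices.
        nontrivial = {c for c in all_clusters if 1 < len(c) < len(leaves)}
        yield pmap, nontrivial


if __name__ == "__main__":
    for pmap, clusters in displayed_clusters(ARCS, LEAVES):
        print(pmap, sorted(sorted(c) for c in clusters))
```

Working with cluster sets sidesteps the suppression of indegree-1, outdegree-1 vertices: a suppressed vertex has the same cluster as its child, so it contributes nothing new to the set.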

We now perform procedure MaximumProperChild. Let $M_k = (V_k, A_k)$.
Initially $V_0 = \{X\}$. The proper children of $X$ are the children of $X$ in any
proper tree. All the trees are proper trees for $X$. Hence $\mathrm{ProperCh}(X) =
\{\{1,2,3,4,6\}, \{5\}, \{1,2,3,4\}, \{5,6\}\}$. The maximal proper children are the
maximal members of $\mathrm{ProperCh}(X)$. Hence $\mathrm{MaxProperCh}(X) = \{\{1,2,3,4,6\},
\{5,6\}\}$. These are adjoined to $M_0$ as children of $X$. Hence $M_1 = (V_1, A_1)$ has
$V_1 = \{X, \{1,2,3,4,6\}, \{5,6\}\}$ and has arcs $(X, \{1,2,3,4,6\})$ and $(X, \{5,6\})$.

Let $C = \{5,6\}$ in $V_1$. By 2b, the children will be $\{5\}$ and $\{6\}$. Hence $M_2$ has
$V_2 = \{X, \{1,2,3,4,6\}, \{5,6\}, \{5\}, \{6\}\}$ and the arcs are those of $M_1$ together
with $(\{5,6\}, \{5\})$ and $(\{5,6\}, \{6\})$.

Let $C = \{1,2,3,4,6\}$. The proper trees must contain both $X$ and $\{1,2,3,4,6\}$.
Hence $\mathrm{ProperTr}(C) = \{T_1, T_3, T_5, T_7\}$. The proper children of $C$ are the
children of $C$ in one of the proper trees. Hence $\mathrm{ProperCh}(C) = \{\{1,2,3,6\}, \{4\}\}$.
In this case all proper children are maximal. Hence $M_3$ has $V_3 = V_2 \cup \{\{1,2,3,6\},
\{4\}\}$, and the arcs $(C, \{1,2,3,6\})$ and $(C, \{4\})$ are added.

Let $C = \{1,2,3,6\}$. A proper tree must contain $C$, some parent of $C$ (hence
$\{1,2,3,4,6\}$), and $X$. Thus $\mathrm{ProperTr}(C) = \{T_1, T_3, T_5, T_7\}$. The proper
children are the children of $C$ in any of these proper trees, so $\mathrm{ProperCh}(C) =
\{\{1,2,3\}, \{6\}, \{1,3\}, \{2,6\}, \{1,6\}, \{2,3\}, \{1,2,6\}, \{3\}\}$. Then $\mathrm{MaxProperCh}(C)
= \{\{1,2,3\}, \{1,2,6\}\}$. These are adjoined, so $V_4 = V_3 \cup \{\{1,2,3\}, \{1,2,6\}\}$ and
arcs are inserted so that these are the children in $M_4$ of $\{1,2,3,6\}$.

Let $C = \{1,2,3\}$. A proper tree must contain $\{1,2,3\}$, $\{1,2,3,6\}$, $\{1,2,3,4,6\}$,
and $X$. Hence $\mathrm{ProperTr}(C) = \{T_1\}$. Then $\mathrm{ProperCh}(C) = \{\{1\}, \{2,3\}\} =
\mathrm{MaxProperCh}(C)$. Now $V_5 = V_4 \cup \{\{1\}, \{2,3\}\}$.

Let $C = \{1,2,6\}$. A proper tree must contain $\{1,2,6\}$, $\{1,2,3,6\}$, $\{1,2,3,4,6\}$,
and $X$. Hence $\mathrm{ProperTr}(C) = \{T_7\}$. It follows that $\mathrm{ProperCh}(C) = \{\{1\},
\{2\}, \{6\}\} = \mathrm{MaxProperCh}(C)$. Now $V_6 = V_5 \cup \{\{1\}, \{2\}, \{6\}\}$. Note that $\{6\}$
was already in $V_5$, but it is at this stage that we obtain the arc $(\{1,2,6\}, \{6\})$.

Let $C = \{2,3\}$. By 2b the children will be $\{2\}$ and $\{3\}$. Hence $V_7 =
V_6 \cup \{\{2\}, \{3\}\}$.

The procedure terminates now with $M_7$. Note that $V_7$ now consists of exactly
the sets $\mathrm{cl}(U, N)$ where $U$ is a vertex of $N$. Similarly the arcs of $M_7$ consist
exactly of the arcs $(\mathrm{cl}(U, N), \mathrm{cl}(W, N))$ such that $(U, W)$ is an arc of $N$. Thus
$M_7$ is isomorphic with $N$; indeed, it is the cover digraph of $N$.
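
The walk-through above can be replayed with a short Python sketch. It is a simplified reading of procedure MaximumProperChild under stated assumptions, not the definitive statement of it: each tree is represented only by its set of clusters; a tree is taken to be proper for $C$ when the path from the root cluster $X$ down to $C$ in the tree is also a directed path in the current $M_k$; clusters of size 2 are handled by the same rule rather than by a separate step 2b; and clusters are processed once each, in the order in which they are discovered. All function names are illustrative.

```python
def fs(*xs):
    """Shorthand for a cluster (a frozenset of leaves)."""
    return frozenset(xs)


X = fs(1, 2, 3, 4, 5, 6)

# Nontrivial clusters of T_1, ..., T_8, copied from the list in this section.
NONTRIVIAL = [
    [fs(2, 3), fs(1, 2, 3), fs(1, 2, 3, 6), fs(1, 2, 3, 4, 6)],  # T_1
    [fs(2, 3), fs(1, 2, 3), fs(1, 2, 3, 4), fs(5, 6)],           # T_2
    [fs(1, 3), fs(2, 6), fs(1, 2, 3, 6), fs(1, 2, 3, 4, 6)],     # T_3
    [fs(1, 3), fs(1, 2, 3), fs(1, 2, 3, 4), fs(5, 6)],           # T_4
    [fs(2, 3), fs(1, 6), fs(1, 2, 3, 6), fs(1, 2, 3, 4, 6)],     # T_5
    [fs(2, 3), fs(1, 2, 3), fs(1, 2, 3, 4), fs(5, 6)],           # T_6
    [fs(1, 2, 6), fs(1, 2, 3, 6), fs(1, 2, 3, 4, 6)],            # T_7
    [fs(1, 2), fs(1, 2, 3), fs(1, 2, 3, 4), fs(5, 6)],           # T_8
]
# Add the root cluster X and the singleton leaves to every tree.
TREES = [set(t) | {X} | {fs(x) for x in X} for t in NONTRIVIAL]


def parent(tree, c):
    """Smallest cluster of the tree properly containing c (None for the root)."""
    above = [d for d in tree if c < d]
    return min(above, key=len) if above else None


def children(tree, c):
    """Clusters of the tree whose parent is c."""
    return {d for d in tree if parent(tree, d) == c}


def is_proper(tree, c, arcs):
    """ASSUMED reading: the root-to-c path of the tree is a path in M_k."""
    if c not in tree:
        return False
    while c != X:
        p = parent(tree, c)
        if (p, c) not in arcs:
            return False
        c = p
    return True


def reconstruct(trees):
    """Grow (V, A) from {X} by adjoining maximal proper children."""
    vertices, arcs, queue = {X}, set(), [X]
    while queue:
        c = queue.pop(0)
        if len(c) == 1:
            continue  # leaves have no children
        proper_ch = set()
        for t in trees:
            if is_proper(t, c, arcs):
                proper_ch |= children(t, c)
        maximal = {d for d in proper_ch if not any(d < e for e in proper_ch)}
        for d in maximal:
            arcs.add((c, d))
            if d not in vertices:
                vertices.add(d)
                queue.append(d)
    return vertices, arcs


if __name__ == "__main__":
    V, A = reconstruct(TREES)
    print(len(V), "vertices and", len(A), "arcs")
```

Processing each cluster only once is adequate here because the only hybrid vertices are leaves. For a network with internal hybrid vertices the order of processing may matter, so this sketch should not be taken as a general implementation.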

It is natural to wish that the identification of the children could be simplified,
for example by merely looking at the maximal children of $C$ in any input tree $T$
rather than insisting on proper trees for $C$. This alternative approach, however,
fails on this example. If we did not insist on proper trees, then $\{1\}$ would not be a
maximal child of $\{1,2,3\}$, since $T_8$ contains $\{1,2,3\}$ with the child $\{1,2\}$. Our
procedure works since $T_8$ is not a proper tree for $\{1,2,3\}$: the parent of
$\{1,2,3\}$ in $T_8$ is $\{1,2,3,4\}$, which had not been identified as a cluster in $N$.

5 Discussion 

The main result in this paper is that, if $N = (V, A, r, X)$ is a regular network,
then the procedure MaximumProperChild will reconstruct $N$ from the collection
$\mathrm{Tr}(N)$ of all trees displayed by $N$. The definition of regularity has two parts:

(1) $\mathrm{cl}_N : V \to \mathcal{P}(X)$ is one-to-one; and

(2) there is an arc $(u, v) \in A$ iff $\mathrm{cl}_N(v) \subset \mathrm{cl}_N(u)$ and there is no $w \in V$ such
that $\mathrm{cl}_N(v) \subset \mathrm{cl}_N(w) \subset \mathrm{cl}_N(u)$.
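
The two conditions can be tested directly on a small network. The following Python sketch is illustrative only. It assumes a network is given as a dictionary from each internal vertex to its list of children, that every leaf is labelled by itself, and that `FIG3_ARCS` (inferred from the clusters listed in section 4, not stated in the text) describes the network of Figure 3. The function names `clusters` and `is_regular` are hypothetical.

```python
def clusters(arcs, leaves):
    """Return the map v -> cl_N(v), the set of leaves reachable from v."""
    memo = {}

    def cl(v):
        if v not in memo:
            if v in leaves:
                memo[v] = frozenset([v])
            else:
                memo[v] = frozenset().union(*(cl(c) for c in arcs[v]))
        return memo[v]

    vertices = set(arcs) | {c for cs in arcs.values() for c in cs}
    return {v: cl(v) for v in vertices}


def is_regular(arcs, leaves):
    """Check conditions (1) and (2) of the definition of regularity."""
    cl = clusters(arcs, leaves)
    # (1) the cluster map is one-to-one
    if len(set(cl.values())) != len(cl):
        return False
    # (2) the arcs are exactly the cover relation of strict inclusion
    arc_set = {(u, v) for u, cs in arcs.items() for v in cs}
    cover = {
        (u, v)
        for u in cl
        for v in cl
        if cl[v] < cl[u] and not any(cl[v] < cl[w] < cl[u] for w in cl)
    }
    return arc_set == cover


# ASSUMPTION: arcs of the Figure 3 network, inferred from its clusters.
FIG3_ARCS = {
    "A": ["B", "C"], "B": ["D", "4"], "C": ["5", "6"], "D": ["F", "G"],
    "E": ["2", "3"], "F": ["1", "2", "6"], "G": ["E", "1"],
}
FIG3_LEAVES = {"1", "2", "3", "4", "5", "6"}

if __name__ == "__main__":
    print(is_regular(FIG3_ARCS, FIG3_LEAVES))
```

A made-up network with arcs r to a, r to 3, r to 1, a to 1, and a to 2 satisfies (1) but not (2), because the arc from r to 1 is redundant: {1} is contained in cl(a) = {1,2}, which is contained in cl(r) = {1,2,3}.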

The network N in Figure 1 is not regular because (1) fails, and the method 
fails to reconstruct N. Both M and N in Figure 1 have the same displayed 
trees. The procedure of course reconstructs M since M is regular. 

Figure 4 shows two networks A and B that are not regular. Even though (1) 
holds for both, (2) fails. It can be seen that they display exactly the same trees. 
Indeed, using Newick notation, they both display (4, (3, (1,2))), (4, (1, (2,3))), 
and (4, (1, 2, 3)) but not (4, (2, (1, 3))). Hence the conclusion of Cor 4.2 fails for 
these networks when just the second condition of regularity fails. Curiously, one 
easily checks that there is no regular network N that displays these three trees 
and no others. 

In the example of section 4, it is easy to see that $N$ would be reconstructed
using procedure MaximumProperChild given the input $\mathcal{D} = \{T_1, T_2, T_7\}$. Thus,
reconstruction may be possible even if $\mathcal{D} \ne \mathrm{Tr}(N)$. On the other hand, in the
same example if $\mathcal{D}$ = {$T_1$, $T_2$, T^}, then $N$ is not reconstructed. It would be
interesting to characterize $\mathcal{D} \subseteq \mathrm{Tr}(N)$ for which reconstruction of $N$ is possible
using MaximumProperChild with input $\mathcal{D}$.